NSF PAR Search | NSF Public Access Repository


Investigating open reading frames in known and novel transcripts using ORFanage

https://doi.org/10.1038/s43588-023-00496-1

Varabyou, Ales; Erdogdu, Beril; Salzberg, Steven L.; Pertea, Mihaela (July 2023, Nature Computational Science)

Full Text Available
The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual

https://doi.org/10.1093/g3journal/jkac321

Chao, Kuan-Hao; Zimin, Aleksey V.; Pertea, Mihaela; Salzberg, Steven L. (January 2023, G3: Genes, Genomes, Genetics)
Emerson, J. J. (Ed.)

Abstract: We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.
TieBrush: an efficient method for aggregating and summarizing mapped reads across large datasets

https://doi.org/10.1093/bioinformatics/btab342

Varabyou, Ales; Pertea, Geo; Pockrandt, Christopher; Pertea, Mihaela (May 2021, Bioinformatics)
Ponty, Yann (Ed.)
Summary: Although the ability to programmatically summarize and visually inspect sequencing data is an integral part of genome analysis, currently available methods are not capable of handling large numbers of samples. In particular, making a visual comparison of transcriptional landscapes between two sets of thousands of RNA-seq samples is limited by available computational resources, which can be overwhelmed due to the sheer size of the data. In this work, we present TieBrush, a software package designed to process very large sequencing datasets (RNA, whole-genome, exome, etc.) into a form that enables quick visual and computational inspection. TieBrush can also be used as a method for aggregating data for downstream computational analysis, and is compatible with most software tools that take aligned reads as input. Availability and implementation: TieBrush is provided as a C++ package under the MIT License. Precompiled binaries, source code and example data are available on GitHub (https://github.com/alevar/tiebrush). Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
Effects of transcriptional noise on estimates of gene and transcript expression in RNA sequencing experiments

https://doi.org/10.1101/gr.266213.120

Varabyou, Ales; Salzberg, Steven L.; Pertea, Mihaela (February 2021, Genome Research)
Full Text Available
GFF Utilities: GffRead and GffCompare

https://doi.org/10.12688/f1000research.23297.1

Pertea, Geo; Pertea, Mihaela (January 2020, F1000Research)

Summary: GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations. Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license (https://github.com/gpertea/gffread, https://github.com/gpertea/gffcompare).
Full Text Available
Transcriptome assembly from long-read RNA-seq alignments with StringTie2

https://doi.org/10.1186/s13059-019-1910-1

Kovaka, Sam; Zimin, Aleksey V.; Pertea, Geo M.; Razaghi, Roham; Salzberg, Steven L.; Pertea, Mihaela (December 2019, Genome Biology)

Full Text Available
ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest

https://doi.org/10.1371/journal.pcbi.1007556

Li, Jiajin; Jew, Brandon; Zhan, Lingyu; Hwang, Sungoo; Coppola, Giovanni; Freimer, Nelson B.; Sul, Jae Hoon; Pertea, Mihaela (December 2019, PLOS Computational Biology)

Full Text Available
Human contamination in bacterial genomes has created thousands of spurious proteins

https://doi.org/10.1101/gr.245373.118

Breitwieser, Florian P.; Pertea, Mihaela; Zimin, Aleksey; Salzberg, Steven L. (January 2019, Genome Research)

Full Text Available
Tximeta: Reference sequence checksums for provenance identification in RNA-seq

https://doi.org/10.1371/journal.pcbi.1007664

Love, Michael I.; Soneson, Charlotte; Hickey, Peter F.; Johnson, Lisa K.; Pierce, N. Tessa; Shepherd, Lori; Morgan, Martin; Patro, Rob; Pertea, Mihaela (February 2020, PLOS Computational Biology)

Full Text Available
